Acoustic-to-phonetic mapping using recurrent neural networks
نویسندگان
چکیده
This paper describes the application of artificial neural networks to acoustic-to-phonetic mapping. The experiments described are typical of problems in speech recognition in which the temporal nature of the input sequence is critical. The specific task considered is that of mapping formant contours to the corresponding CVC' syllable. We performed experiments on formant data extracted from the acoustic speech signal spoken at two different tempos (slow and normal) using networks based on the Elman simple recurrent network model. Our results show that the Elman networks used in these experiments were successful in performing the acoustic-to-phonetic mapping from formant contours. Consequently, we demonstrate that relatively simple networks, readily trained using standard backpropagation techniques, are capable of initial and final consonant discrimination and vowel identification for variable speech rates.
منابع مشابه
Text-to-speech conversion with neural networks: a recurrent TDNN approach
This paper describes the design of a neural network that performs the phonetic-to-acoustic mapping in a speech synthesis system. The use of a time-domain neural network architecture limits discontinuities that occur at phone boundaries. Recurrent data input also helps smooth the output parameter tracks. Independent testing has demonstrated that the voice quality produced by this system compares...
متن کاملAreal and Phylogenetic Features for Multilingual Speech Synthesis
We introduce phylogenetic and areal language features to the domain of multilingual text-to-speech synthesis. Intuitively, enriching the existing universal phonetic features with crosslingual shared representations should benefit the multilingual acoustic models and help to address issues like data scarcity for low-resource languages. We investigate these representations using the acoustic mode...
متن کاملSpeech Synthesis with Neural Networks
Text-to-speech conversion has traditionally been performed either by concatenating short samples of speech or by using rule-based systems to convert a phonetic representation of speech into an acoustic representation, which is then converted into speech. This paper describes a system that uses a time-delay neural network (TDNN) to perform this phonetic-to-acoustic mapping, with another neural n...
متن کاملMining Speech Sounds,Machine Learning Methods for Automatic Speech Recognition and Analysis
This thesis collects studies on machine learning methods applied to speech technology and speech research problems. The six research papers included in this thesis are organised in three main areas. The first group of studies were carried out within the European project Synface. The aim was to develop a low latency phonetic recogniser to drive the articulatory movements of a computer-generated ...
متن کاملAcoustic-phonetic decoding based on elman predictive neural networks
In this paper we present a phoneme recognition system based on the Elman predictive neural networks. The recurrent neural networks are used to predict the observation vectors of speech frames. Recognition of phonemes is done using the prediction error as distortion measure in the Viterbi algorithm. The performance of the neural predictive networks is evaluated on both the training database and ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- IEEE transactions on neural networks
دوره 5 4 شماره
صفحات -
تاریخ انتشار 1994